Abstract

World-set algebra is a variable-free query language for uncertain databases. It con- 
stitutes the core of the query language implemented in MayBMS, an uncertain database 
system. This paper shows that world-set algebra captures exactly second-order logic 
over finite structures, or equivalently, the polynomial hierarchy. The proofs also imply 
that world-set algebra is closed under composition, a previously open problem. 

1 Introduction 



Developing suitable query languages for uncertain databases is a substantial research
challenge that is only now beginning to be addressed. In previous work [3], we have
developed a query language in the spirit of relational algebra for processing uncertain data:
world-set algebra (WSA). WSA consists of the operations of relational algebra plus two
further operations, one to introduce uncertainty and one to compute possible tuples across
groups of possible worlds. WSA is implemented in the MayBMS system [3, 2, 10, 9].

It remains to obtain an understanding of the complexity and expressive power of world- 
set algebra. The main result of this paper is a proof that world-set algebra over uncertain 
databases consisting of finite sets of possible worlds (each one a relational database) pre- 
cisely captures second-order logic (SO) over finite structures, or equivalently, the polynomial 
hierarchy. This seems to be a somewhat surprising coincidence, since the language was not 
designed with this result as a goal but by abstraction from a set of use cases from the 
contexts of hypothetical ("what-if") queries, decision support queries, and data cleaning. 
Viewed differently, WSA is a natural variable-free language equivalent to SO; it is to SO
what relational algebra is to first-order logic. To the best of the author's knowledge, no 
other such language is known. 

The fact that WSA exactly captures second-order logic is a strong argument to justify 
it as a query language for uncertain data. Second-order logic is a natural yardstick for 
languages for querying possible worlds. Indeed, second-order quantifiers are the essence of 
what-if reasoning about databases. World-set algebra seems to be a strong candidate for 
a core algebra for forming query plans and optimizing and executing them in uncertain 
database management systems. 

It was left open in previous work whether world-set algebra is closed under composition,
or in other words, whether definitions add to the expressive power of the language.
Compositionality is a desirable and rather commonplace property of query algebras, but
in the case of WSA it seems rather unlikely to hold. The reason for this is that the
algebra contains an uncertainty-introduction operation that on the level of possible worlds is
nondeterministic. First materializing a view and subsequently using it multiple times in
the query is semantically quite different from composing the query with the view and thus
obtaining several copies of the view definition that can now independently make their
nondeterministic choices. In the paper, evidence is given that seems to suggest that definitions
are essential for the expressive power of WSA.

The paper nevertheless gives a proof that definitions do not add to the power of the 
language, and WSA is indeed compositional. In fact, there is even a (nontrivial) practical 
linear-time translation from SO to WSA without definitions. This result, and the techniques 
for proving it, may also be relevant in other contexts. For example, it is shown that self- 
joins essentially can always be eliminated from classical relational algebra at the cost of 
introducing difference operators. 

The proofs also imply that WSA is complete for the polynomial hierarchy with respect
to data complexity and PSPACE-complete with respect to combined complexity [15, 14].

For use as a query language for probabilistic databases, WSA has been extended very
slightly by a tuple confidence computation operation (see e.g. [9]). The focus of this
paper is on the nonprobabilistic language of [3]. For the efficient processing of queries of
this language, the confidence operation is naturally orthogonal to the remaining operations
[2, 10, 9]. The expressiveness and complexity results obtained in the present paper
constitute lower bounds for the probabilistic version of the language. But the non-probabilistic
language is interesting and important in its own right: Many interesting queries can be 
phrased in terms of the alternatives possible in a data management scenario with uncer- 
tainty, without reference to the relative (probability) weights of these alternatives. 

The structure of this paper is as follows. Section 2 establishes the connection between
second-order logic and uncertain databases. Section 3 introduces world-set algebra and gives
formal definitions of syntax and semantics. Section 4 proves that WSA exactly captures
the expressive power of second-order logic over finite structures. These proofs assume the
availability of a construct for making definitions (materializing views). Section 5 discusses
the importance of being able to compose these definitions with the language, and shows
why it should seem rather surprising that definitions are not needed for capturing
second-order logic. Section 6 finally proves that definitions can indeed be eliminated without loss
of expressive power, and a construction for composition is given. We obtain from these
results that WSA with or without definitions is complete for the polynomial hierarchy with
respect to data complexity and PSPACE-complete with respect to combined complexity.
We discuss related work in Section 7 and conclude in Section 8.

2 Uncertain Databases 

The schema of a relational database is a set of relation names together with a function $sch$
that maps each relation name to a tuple of attribute names. We use calligraphic symbols
such as $\mathcal{A}$ for relational databases. The arity $|sch(R)|$ of a relation $R$ is denoted by $ar(R)$.
We will use the standard syntax of second-order logic (SO) (see e.g. [11]). Its semantics
is defined using the satisfaction relation $\models$, as usual. Throughout this paper, we will only use
second-order logic relativized to some finite set of domain elements (say, $D$), as is common
in finite model theory (cf. [11]). That is, first-order quantifiers $\exists x\, \phi$ are to be read as
$\exists x\, D(x) \wedge \phi$ and second-order quantifiers $\exists R\, \phi$ are to be interpreted as
$\exists R\, R \subseteq D^{ar(R)} \wedge \phi$.



An uncertain database over a given schema represents a finite set $W = \{\mathcal{A}_1, \dots, \mathcal{A}_n\}$ of
relational databases of that schema, called the possible worlds. One world among these is
the true world, but we do not know which one.

A representation for a finite set of possible worlds $W$ over schema $(R_1, \dots, R_k)$ is a
pair of a relational database schema and a formula $\omega$ over that database schema with free
second-order variables $R_1, \dots, R_k$ and without free first-order variables such that $\omega$ is true
on exactly those structures that are in $W$:

$$(R_1, \dots, R_k) \models \omega \;\Leftrightarrow\; (R_1, \dots, R_k) \in W.$$

Example 2.1 (Standard Representation) Consider a representation of an uncertain 
database by relations that associate with each tuple a local condition in the form of a 
conjunction of propositional literals. A possible world is identified by a truth assignment 
for the propositional variables used, and a tuple is in a possible world if the world's truth 
assignment makes the tuple's clause true. 

A representation database consists of a set $V$ of propositional variables, a relation $L$
such that $L(c, p, 1)$ is true iff variable $p$ occurs positively in conjunction $c$ and $L(c, p, 0)$ is
true iff variable $p$ occurs negated in $c$, and a representation relation $R'_i$ for each schema
relation $R_i$ which extends the schema of $R_i$ by a column to associate each tuple with a
conjunction.

Possible worlds are identified by subsets $P \subseteq V$ of variables that are true. A tuple $t$ is
in relation $R_i$ in possible world $P$ if $R'_i(t, c)$ is true and conjunction $c$ is true for the variable
assignment that makes the variables in $P$ true and the others false.

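The representation formula $\omega(R_1, \dots, R_k)$ is

$$\exists P\; P \subseteq V \wedge \bigwedge_{i=1}^{k} \forall t\; R_i(t) \Leftrightarrow \exists c\; R'_i(t, c) \wedge \forall p\, (L(c, p, 0) \Rightarrow \neg P(p)) \wedge (L(c, p, 1) \Rightarrow P(p)).$$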

This is the representation system that is essentially used in MystiQ [5], Trio [4], and
MayBMS [2]. It is a special case of c-tables [7] in which local conditions are in DNF, there
is no global condition, and no variables occur in the data tuples themselves (just in the 
local conditions associated with the data tuples). Note that it is complete in the sense that 
it can represent any nonempty finite set of possible worlds. Moreover, it is succinct, i.e., 
the cardinality of the represented set of possible worlds is in general exponential in the size 
of the representation database. □ 
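The standard representation can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: there is a single schema relation, and the names `possible_worlds`, `conjunction_holds`, and the toy instance are hypothetical, not part of any system mentioned above. It enumerates the truth assignments $P \subseteq V$ and collects the tuples whose conjunction holds under each.

```python
from itertools import combinations


def truth_assignments(variables):
    """Yield every subset P of the variables (the variables set to true)."""
    ordered = sorted(variables)
    for n in range(len(ordered) + 1):
        for chosen in combinations(ordered, n):
            yield set(chosen)


def conjunction_holds(c, P, L):
    """True iff every literal of conjunction c agrees with assignment P."""
    return all((p in P) == (sign == 1) for (cid, p, sign) in L if cid == c)


def possible_worlds(V, L, R_rep):
    """Return the set of worlds (each a frozenset of tuples) encoded by R_rep."""
    return {
        frozenset(t for (t, c) in R_rep if conjunction_holds(c, P, L))
        for P in truth_assignments(V)
    }


# Hypothetical toy instance: tuple "a" needs x, "b" needs (not x), "c" needs y.
V = {"x", "y"}
L = {("c1", "x", 1), ("c2", "x", 0), ("c3", "y", 1)}
R_rep = {("a", "c1"), ("b", "c2"), ("c", "c3")}
worlds = possible_worlds(V, L, R_rep)
```

Two variables yield four truth assignments, and in this instance each assignment produces a different world, which illustrates the exponential gap between representation size and number of worlds.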

It is now easy to use second-order logic for expressing queries on uncertain databases 
encoded by a representation. For instance, query $\phi$ is possible if $\exists R_1 \cdots R_k\; \omega \wedge \phi$ and certain
if $\forall R_1 \cdots R_k\; \omega \Rightarrow \phi$. Second-order logic allows us to use succinct representations, but also
yields very powerful hypothetical queries that can ask questions about possible choices of 
sets of tuples. Such a choice of sets could be e.g. clusters of tuples in record matching (also 
known as deduplication and under many other names). 

3 The Algebra 

3.1 Syntax and Semantics 

World-set algebra (WSA) consists of the operations of relational algebra (selection $\sigma$,
projection $\pi$, renaming $\rho$, product $\times$, union $\cup$, and difference $-$), two additional operations
$\text{repair-key}_{\vec{A}}$ and $\text{possible}_{\vec{A}}$, and definitions "let $R := Q$ in $Q'$" where $R$ is a new relation
symbol that may be used in $Q'$. WSA without definitions is the set of WSA queries in
which no let-expressions occur.

Conceptually all operations are evaluated in each possible world individually. The
operations of relational algebra are evaluated within possible world $\mathcal{A}$ in the normal way.
Given input relation $R$, $\text{repair-key}_{\vec{A}}(R)$ nondeterministically chooses a maximal repair of
the functional dependency $\vec{A} \to sch(R)$ on $R$, that is, it returns a subset $R'$ of $R$ in which
$\vec{A}$ is a (super)key such that there is no superset of $R'$ which is a subset of $R$ and in which
$\vec{A}$ is a (super)key. The operation $\text{possible}_{\vec{A}}(Q)$ is the only operation that can look into
alternative possible worlds. It computes, for the current possible world given by $\mathcal{A}$, the
set of possible tuples occurring in the results of $Q$ across the group of possible worlds that
agree with $\mathcal{A}$ on $\pi_{\vec{A}}(Q)$. Definitions (statements "let $R := Q$ in $Q'$") extend $\mathcal{A}$ by a named
relation $R$ defined by query $Q$. Since $Q$ is nondeterministic in general, the overall set of
possible worlds on which $Q'$ runs (which is relevant for computing $\text{possible}_{\vec{A}}$) may increase.
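The two non-classical operations can be illustrated by a minimal sketch. It assumes relations are Python sets of tuples addressed by column position, and it models the outcome of a query across all worlds simply as a set of result relations; the function names are illustrative only and do not reflect how MayBMS evaluates queries.

```python
from itertools import product


def project(rel, idx):
    """Projection of a relation (set of tuples) onto the column positions idx."""
    return frozenset(tuple(t[i] for i in idx) for t in rel)


def repair_key(rel, key_idx):
    """All maximal repairs of rel: keep exactly one tuple per key value."""
    groups = {}
    for t in rel:
        groups.setdefault(tuple(t[i] for i in key_idx), []).append(t)
    return {frozenset(choice) for choice in product(*groups.values())}


def possible(results, idx):
    """Map each world's result to the union over the worlds agreeing on idx."""
    return {
        r: frozenset().union(
            *(s for s in results if project(s, idx) == project(r, idx))
        )
        for r in results
    }


R = {("a", 1), ("a", 2), ("b", 1)}
repairs = repair_key(R, key_idx=(0,))
grouped_all = possible(repairs, idx=())        # all worlds form one group
grouped_full = possible(repairs, idx=(0, 1))   # each world is its own group
```

With the empty attribute list all worlds fall into one group, so every world sees the union of all results; grouping on all columns leaves each result unchanged.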

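Formally, the semantics of world-set algebra is defined using a translation $[\![\cdot]\!]^{\mathcal{A}}_{W}$ such
that for a context of a set of possible worlds $W$ and a world $\mathcal{A} \in W$, $R$ is a possible result
of world-set algebra query $Q$ iff $R \in [\![Q]\!]^{\mathcal{A}}_{W}$:

$$
\begin{aligned}
[\![\{\vec{t}\}]\!]^{\mathcal{A}}_{W} &:= \{\{\vec{t}\}\} && \ldots\ \vec{t} \text{ constant tuple} \\
[\![R]\!]^{\mathcal{A}}_{W} &:= \{R^{\mathcal{A}}\} \\
[\![\theta(Q)]\!]^{\mathcal{A}}_{W} &:= \{\theta(R) \mid R \in [\![Q]\!]^{\mathcal{A}}_{W}\} && \ldots\ \theta \in \{\sigma_{\phi}, \pi_{\vec{A}}, \rho_{A \to B}\} \\
[\![Q_1\ \theta\ Q_2]\!]^{\mathcal{A}}_{W} &:= \{R_1\ \theta\ R_2 \mid R_1 \in [\![Q_1]\!]^{\mathcal{A}}_{W},\ R_2 \in [\![Q_2]\!]^{\mathcal{A}}_{W}\} && \ldots\ \theta \in \{\times, \cup, -\} \\
[\![\text{repair-key}_{\vec{A}}(Q)]\!]^{\mathcal{A}}_{W} &:= \{R' \mid R' \subseteq R \in [\![Q]\!]^{\mathcal{A}}_{W},\ \pi_{\vec{A}}(R) = \pi_{\vec{A}}(R'),\ \vec{A} \text{ is a key for } R'\} \\
[\![\text{possible}_{\vec{A}}(Q)]\!]^{\mathcal{A}}_{W} &:= \Big\{ \bigcup \{R' \mid \mathcal{B} \in W,\ R' \in [\![Q]\!]^{\mathcal{B}}_{W},\ \pi_{\vec{A}}(R) = \pi_{\vec{A}}(R')\} \;\Big|\; R \in [\![Q]\!]^{\mathcal{A}}_{W} \Big\} \\
[\![\text{let } R := Q \text{ in } Q']\!]^{\mathcal{A}}_{W} &:= \{ [\![Q']\!]^{(\mathcal{A}, R)}_{W'} \mid R \in [\![Q]\!]^{\mathcal{A}}_{W} \}
\end{aligned}
$$

where $W' = \{(\mathcal{B}, R) \mid \mathcal{B} \in W,\ R \in [\![Q]\!]^{\mathcal{B}}_{W}\}$.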

Queries are run against an uncertain database $W$, and $[\![Q]\!]^{\mathcal{A}}_{W}$ gives the result of $Q$ seen
in possible world $\mathcal{A}$ of $W$. Using $\text{possible}_{\emptyset}$, we can close the possible worlds semantics and
ask for possible (or, using difference, certain) tuples. For such queries $\mathcal{A}$ can be chosen
arbitrarily (and the semantics function can be considered to be of the form $[\![Q]\!]_{W}$).

Definitions in subexpressions are unaffected by the operations higher up in the
expression tree and can be pulled to the top of the expression without modification. This is a
direct consequence of the following fact, where we assume that $\theta$ may be any of the WSA
operations. (Thus $0 \le k \le 2$ and for $\text{possible}_{\vec{A}}$, $k = 1$.)

Proposition 3.1 For arbitrary WSA queries $Q$, $\theta(Q_1, \dots, Q_k)$, if $V$ occurs only in $Q_i$,

$$\theta(Q_1, \dots, Q_{i-1}, (\text{let } V := Q \text{ in } Q_i), Q_{i+1}, \dots, Q_k) = (\text{let } V := Q \text{ in } \theta(Q_1, \dots, Q_k)).$$



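Proof. It can be shown by an easy induction that for any $Q$, $[\![Q]\!]^{(\mathcal{A}, V)}_{W} = [\![Q]\!]^{\mathcal{A}}_{W'}$ where
$W' = \{\mathcal{A} \mid (\mathcal{A}, V) \in W\}$ if relation name $V$ does not appear in $Q$. This is immediate for all
operations other than $\text{possible}_{\vec{A}}$. Let $Q = \text{possible}_{\vec{A}}(Q')$ and let the induction hypothesis
hold for $Q'$, i.e., $[\![Q']\!]^{(\mathcal{A}, V)}_{W} = [\![Q']\!]^{\mathcal{A}}_{W'}$. Then

$$
\begin{aligned}
[\![\text{possible}_{\vec{A}}(Q')]\!]^{(\mathcal{A}, V)}_{W}
&= \Big\{ \bigcup \{R' \mid (\mathcal{B}, V') \in W,\ R' \in [\![Q']\!]^{(\mathcal{B}, V')}_{W},\ \pi_{\vec{A}}(R) = \pi_{\vec{A}}(R')\} \;\Big|\; R \in [\![Q']\!]^{(\mathcal{A}, V)}_{W} \Big\} \\
&= \Big\{ \bigcup \{R' \mid \mathcal{B} \in W',\ R' \in [\![Q']\!]^{\mathcal{B}}_{W'},\ \pi_{\vec{A}}(R) = \pi_{\vec{A}}(R')\} \;\Big|\; R \in [\![Q']\!]^{\mathcal{A}}_{W'} \Big\} \\
&= [\![\text{possible}_{\vec{A}}(Q')]\!]^{\mathcal{A}}_{W'}.
\end{aligned}
$$

Now we apply the fact just proven to the subqueries $Q_j$ for $j \ne i$. By definition,

$$[\![\text{let } V := Q \text{ in } \theta(Q_1, \dots, Q_k)]\!]^{\mathcal{A}}_{W'} = \{ [\![\theta(Q_1, \dots, Q_k)]\!]^{(\mathcal{A}, V)}_{W} \mid V \in [\![Q]\!]^{\mathcal{A}}_{W'} \}.$$

We distinguish between the various operations $\theta$. For relational algebra,

$$
\begin{aligned}
[\![\theta(Q_1, \dots, Q_k)]\!]^{(\mathcal{A}, V)}_{W}
&= \Big\{ \theta(R_1, \dots, R_k) \;\Big|\; \bigwedge_{j} R_j \in [\![Q_j]\!]^{(\mathcal{A}, V)}_{W} \Big\} \\
&= \Big\{ \theta(R_1, \dots, R_k) \;\Big|\; R_i \in [\![Q_i]\!]^{(\mathcal{A}, V)}_{W},\ \bigwedge_{j \ne i} R_j \in [\![Q_j]\!]^{\mathcal{A}}_{W'} \Big\}
\end{aligned}
$$

because $V$ only occurs in $Q_i$ and $[\![Q_j]\!]^{(\mathcal{A}, V)}_{W} = [\![Q_j]\!]^{\mathcal{A}}_{W'}$ for $j \ne i$. Thus

$$
\begin{aligned}
[\![\text{let } V := Q \text{ in } \theta(Q_1, \dots, Q_k)]\!]^{\mathcal{A}}_{W'}
&= \Big\{ \theta(R_1, \dots, R_k) \;\Big|\; \underbrace{R_i \in [\![Q_i]\!]^{(\mathcal{A}, V)}_{W},\ V \in [\![Q]\!]^{\mathcal{A}}_{W'}}_{R_i \in [\![\text{let } V := Q \text{ in } Q_i]\!]^{\mathcal{A}}_{W'}},\ \bigwedge_{j \ne i} R_j \in [\![Q_j]\!]^{\mathcal{A}}_{W'} \Big\} \\
&= [\![\theta(Q_1, \dots, Q_{i-1}, (\text{let } V := Q \text{ in } Q_i), Q_{i+1}, \dots, Q_k)]\!]^{\mathcal{A}}_{W'}.
\end{aligned}
$$

The proof for the remaining operations proceeds similarly. □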

In other words, definitions can be considered "global". That is, without loss of generality we
could assume that each WSA query is of the form

$$\text{let } V_1 := Q_1 \text{ in } (\cdots (\text{let } V_k := Q_k \text{ in } Q) \cdots)$$

where $Q$ does not contain definitions.

Observe that in the case of binary relational algebra operations $\theta$, the set of possible
worlds $[\![Q_1\ \theta\ Q_2]\!]_{W}$ is obtained by pairing relations in the results of $[\![Q_1]\!]_{W}$ and $[\![Q_2]\!]_{W}$.
This is consistent with the intuition that $\theta$ is applied to possible worlds $\mathcal{B}$ that contain two
relations $R_1^{\mathcal{B}}$ and $R_2^{\mathcal{B}}$ and the result in $\mathcal{B}$ is $R_1^{\mathcal{B}}\ \theta\ R_2^{\mathcal{B}}$: Proposition 3.1 implies that

$$\theta(Q_1, \dots, Q_k) = (\text{let } V_1 := Q_1, \dots, V_k := Q_k \text{ in } \theta(V_1, \dots, V_k)).$$

As a convention, we use $\{\langle\rangle\}$ to represent truth and $\emptyset$ to represent falsity, over a nullary
relation schema.



Example 3.2 Consider a relational database with relations $V(V)$ and $E(From, To)$
representing a graph (directed, or undirected if $E$ is symmetric). Then the following WSA query
$Q$ returns true iff the graph is 3-colorable:

$$
\begin{aligned}
&\text{let } R := \text{repair-key}_{sch(V)}(V \times \rho_{C}(\{r\} \cup \{g\} \cup \{b\})) \text{ in} \\
&\quad \text{possible}(\{\langle\rangle\} - \pi_{\emptyset}(\sigma_{1.V = 2.From \,\wedge\, 2.To = 3.V \,\wedge\, 1.C = 3.C}(R \times E \times R))).
\end{aligned}
$$

The possible relations $R$ are all the functions $V \to \{r, g, b\}$, and $Q$ simply asks whether
there is such a function $R$ such that there do not exist two adjacent nodes of the same
color.

The corresponding SO sentence is

$$\exists R\; \phi_{R: V \to \{r,g,b\}} \wedge \neg \exists u, v, c\; R(u, c) \wedge E(u, v) \wedge R(v, c)$$

where $\phi_{R: V \to \{r,g,b\}}$ is a first-order sentence that states that $R$ is a relation $\subseteq V \times \{r, g, b\}$
that satisfies the functional dependency $R : V \to \{r, g, b\}$. □
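The query of Example 3.2 can be mimicked by brute force, which may help in reading it. The sketch below is an illustration only, with hypothetical function names: it enumerates the repairs of $V \times \{r, g, b\}$ with key $V$ as total color assignments and tests whether some world contains no monochromatic edge. It says nothing about how a WSA engine would evaluate the query.

```python
from itertools import product

COLORS = ("r", "g", "b")


def repairs(nodes):
    """Each maximal repair of V x {r, g, b} with key V is a total coloring."""
    ordered = sorted(nodes)
    for colors in product(COLORS, repeat=len(ordered)):
        yield dict(zip(ordered, colors))


def three_colorable(nodes, edges):
    """True iff some possible world R has no edge whose endpoints share a color."""
    return any(
        all(R[u] != R[v] for (u, v) in edges)
        for R in repairs(nodes)
    )


triangle = three_colorable({1, 2, 3}, {(1, 2), (2, 3), (1, 3)})
k4 = three_colorable(
    {1, 2, 3, 4},
    {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)},
)
```

The inner `all(...)` plays the role of $\{\langle\rangle\} - \pi_{\emptyset}(\sigma(\cdots))$ within one world, and the outer `any(...)` plays the role of possible.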

3.2 Derived Operations: Syntactic Sugar 

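We will also consider the following operations, which are definable in the base language:

$$
\begin{aligned}
[\![\text{subset}(Q)]\!]^{\mathcal{A}}_{W} &:= \{R' \mid R' \subseteq R \in [\![Q]\!]^{\mathcal{A}}_{W}\} \\
[\![\text{choice-of}_{\vec{A}}(Q)]\!]^{\mathcal{A}}_{W} &:= \{\sigma_{\vec{A} = \vec{a}}(R) \mid R \in [\![Q]\!]^{\mathcal{A}}_{W},\ \vec{a} \in \pi_{\vec{A}}(R)\} \\
[\![\text{certain}_{\vec{A}}(Q)]\!]^{\mathcal{A}}_{W} &:= \Big\{ \bigcap \{R' \mid \mathcal{B} \in W,\ R' \in [\![Q]\!]^{\mathcal{B}}_{W},\ \pi_{\vec{A}}(R) = \pi_{\vec{A}}(R')\} \;\Big|\; R \in [\![Q]\!]^{\mathcal{A}}_{W} \Big\} \\
[\![\text{possible}(Q)]\!]^{\mathcal{A}}_{W} &:= \Big\{ \bigcup \{R \mid \mathcal{B} \in W,\ R \in [\![Q]\!]^{\mathcal{B}}_{W}\} \Big\} \\
[\![\text{certain}(Q)]\!]^{\mathcal{A}}_{W} &:= \Big\{ \bigcap \{R \mid \mathcal{B} \in W,\ R \in [\![Q]\!]^{\mathcal{B}}_{W}\} \Big\}
\end{aligned}
$$

The operation subset nondeterministically chooses an arbitrary subset of its input
relation. The operation $\text{choice-of}_{\vec{A}}(R)$ nondeterministically chooses an $\vec{a} \in \pi_{\vec{A}}(R)$ and selects
those tuples $t$ of $R$ for which $t.\vec{A} = \vec{a}$. Conceptually, the operations subset and repair-key
cause an exponential blowup of the possible worlds under consideration: for instance, on
a certain database (i.e., consisting of a single possible world) $\text{subset}(R)$ creates the
powerset of relation $R$ as the new set of possible worlds. The operation $\text{certain}_{\vec{A}}$ is the dual
of $\text{possible}_{\vec{A}}$ and computes those tuples common to all the worlds that agree on $\pi_{\vec{A}}$. The
operations possible and certain compute the possible and certain tuples, respectively, across all
possible worlds.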

Proposition 3.3 The operations subset and possible are expressible in WSA without
definitions. The operations $\text{choice-of}_{\vec{A}}$, $\text{certain}_{\vec{A}}$, and certain are definable in WSA with
definitions.

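Proof Sketch. The result is an immediate consequence of the following equivalences.

$$
\begin{aligned}
\text{choice-of}_{\vec{A}}(R) &= R \bowtie \text{repair-key}_{\emptyset}(\pi_{\vec{A}}(R)) \\
\text{certain}_{\vec{A}}(Q) &= Q - \text{possible}_{\vec{A}}(\text{possible}_{\vec{A}}(Q) - Q) \\
\text{subset}(R) &= \pi_{sch(R)}(\sigma_{A=1}(\text{repair-key}_{sch(R)}(R \times \rho_{A}(\{0, 1\})))) \quad \text{(w.l.o.g., } A \notin sch(R)\text{)} \\
\text{possible}(Q) &= \text{possible}_{\emptyset}(Q) \\
\text{certain}(Q) &= \text{certain}_{\emptyset}(Q)
\end{aligned}
$$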
Figure 1: Database (a) and intermediate query results (b-d) of Example 3.5



The expression $\text{possible}_{\emptyset}(Q)$ computes the possible tuples of those worlds in which the result
of $Q$ is nonempty. But, obviously, in the remaining worlds there are no tuples to collect.
By the definition of $\text{certain}_{\vec{A}}$ in terms of $\text{possible}_{\vec{A}}$, the definition of certain is correct too.

□ 
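Two of the equivalences above can be checked on small inputs. The following sketch is a sanity check rather than a proof; it assumes relations are sets of tuples addressed by column position, and all function names are hypothetical. It builds subset from repair-key exactly as in the third equivalence, and choice-of from repair-key on the empty key as in the first.

```python
from itertools import combinations, product


def repair_key(rel, key_idx):
    """All maximal repairs of rel: keep exactly one tuple per key value."""
    groups = {}
    for t in rel:
        groups.setdefault(tuple(t[i] for i in key_idx), []).append(t)
    return {frozenset(choice) for choice in product(*groups.values())}


def subset_via_repair_key(rel, arity):
    """pi_sch(R)(sigma_{A=1}(repair-key_sch(R)(R x {0, 1})))."""
    tagged = {t + (a,) for t in rel for a in (0, 1)}
    return {
        frozenset(t[:-1] for t in world if t[-1] == 1)
        for world in repair_key(tagged, key_idx=tuple(range(arity)))
    }


def choice_of(rel, key_idx):
    """R joined with repair-key on the empty key applied to pi_A(R)."""
    keys = {tuple(t[i] for i in key_idx) for t in rel}
    picks = repair_key(keys, key_idx=())  # each repair keeps one key value
    return {
        frozenset(t for t in rel if tuple(t[i] for i in key_idx) in pick)
        for pick in picks
    }


def powerset(rel):
    ordered = sorted(rel)
    return {
        frozenset(c)
        for n in range(len(ordered) + 1)
        for c in combinations(ordered, n)
    }


R = {("a",), ("b",), ("c",)}
subsets_of_R = subset_via_repair_key(R, arity=1)
CE = {("c1", "e1"), ("c1", "e2"), ("c2", "e3")}
choices = choice_of(CE, key_idx=(0,))
```

On the three-tuple relation the construction yields all eight subsets, and choice-of on the first column yields one world per company.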

Remark 3.4 The operation repair-key is also definable using the base operations without 
repair-key plus subset; however, such a definition seems to need let-statements, while the 
definition of subset using repair-key does not. 

In [3], it was shown that the fragment obtained from WSA by replacing repair-key
by choice-of is a conservative extension of first-order logic. That is, every query of that 
language that maps from a single possible world to a single possible world is equivalent to 
a first-order query. It is not surprising that this is not true for full WSA. 

3.3 A Hypothetical Query Processing Example 

Example 3.5 Consider the relational database of Figure 1(a) which represents employees
working in companies and their skills. The query, a simplified decision support problem, 
will be stated in four steps. 

1. Suppose I choose to buy exactly one company and, as a consequence, exactly one 
(key) employee leaves that company. 

$U := \text{choice-of}_{C,E}(\text{Company\_Emp})$

(This nondeterministically chooses a tuple from Company_Emp.)



2. Who are the remaining employees? 

$V := \pi_{1.C,\, 2.E}(U \bowtie_{1.C = 2.C \,\wedge\, 1.E \ne 2.E} \text{Company\_Emp})$

3. If I acquire that company, which skills can I obtain for certain? 

$W := \text{certain}_{C}(\pi_{C,S}(V \bowtie \text{Emp\_Skills}))$

(This query computes the tuples of $V \bowtie \text{Emp\_Skills}$ that are certain assuming that
the company was chosen correctly, i.e., certain in the set of possible worlds that
agree with this world on the $C$ column.)

4. Now list the possible acquisition targets if the gain of the skill $s_1$ shall be guaranteed
by the acquisition.

$\text{possible}(\pi_{C}(\sigma_{S = s_1}(W)))$

Figure 1(b-d) shows the development of the uncertain database through steps 1 to
3. The first step creates five possible worlds corresponding to the five possible choices of
company and renegade employee from relation Company_Emp. Steps two to four further
process the query, and the overall result, which is the same in all five possible worlds, is

Result
  C
  ---
  $c_1$

□ 

4 WSA with Definitions Captures SO Logic 

In this section, it is shown that WSA with definitions has exactly the same expressive power 
as second-order logic over finite structures. 

Theorem 4.1 For every SO query, there is an equivalent WSA query with definitions. 

Proof. We may assume without loss of generality that the SO query is a first-order query
prefixed by a sequence of second-order quantifiers. The claim follows by induction.

Induction start: FO queries can be translated to relational algebra by a well-known
translation known in the database context as one direction of Codd's Theorem (cf. [1]).

Induction step (second-order existential quantification, $\exists R_{k+1} (\subseteq D^{l})\; \phi$): Let $\phi$ be an SO
formula with free second-order variables $R_1, \dots, R_{k+1}$ and free first-order variables $\vec{x}$ where
$R_{k+1}$ has arity $l$. Let $Q_{\phi}$ be an equivalent WSA expression. Without loss of generality, we
may assume that the relations $R_1, \dots, R_k, Q_{\phi}$ have disjoint schemas. Let

$$Q := (\text{let } R_{k+1} := \text{subset}(D^{l}) \text{ in } \pi_{sch(Q_{\phi})}(\text{possible}_{sch(R_1), \dots, sch(R_k)}(1_{R_1} \times \cdots \times 1_{R_k} \times Q_{\phi}))).$$

where $1_{R_i} = R_i \times \{1\} \cup (D^{ar(R_i)} - R_i) \times \{0\}$. (Note that the relations $1_{R_i}$ will play a
prominent role in later parts of this paper.) We prove that

$$(R_1, \dots, R_k, \vec{x}) \models \exists R_{k+1} (\subseteq D^{l})\; \phi \;\Leftrightarrow\; \vec{x} \in R_Q$$
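where $\{R_Q\} = [\![Q]\!]^{(R_1, \dots, R_k)}_{W}$. By definition of $[\![\cdot]\!]$,

$$[\![Q]\!]^{(R_1, \dots, R_k)}_{W} = \{ [\![\pi_{sch(Q_{\phi})}(Q')]\!]^{(R_1, \dots, R_{k+1})}_{W'} \mid R_{k+1} \subseteq D^{l} \}$$

where $W' = \{(R_1, \dots, R_{k+1}) \mid (R_1, \dots, R_k) \in W,\ R_{k+1} \subseteq D^{l}\}$ and $Q'$ is a shortcut for
$\text{possible}_{sch(R_1), \dots, sch(R_k)}(1_{R_1} \times \cdots \times 1_{R_k} \times Q_{\phi})$.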

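We may assume a nonempty domain $D$, so the result of $1_{R_1} \times \cdots \times 1_{R_k}$ is never empty,
the mapping $(R_1, \dots, R_k) \mapsto 1_{R_1} \times \cdots \times 1_{R_k}$ is injective, and $Q'$ will therefore group the
possible outcomes of $Q_{\phi}$ for the various choices of $R_{k+1}$ by $R_1, \dots, R_k$.

Formally, by definition of $[\![\cdot]\!]$,

$$
\begin{aligned}
[\![Q']\!]^{(R_1, \dots, R_{k+1})}_{W'}
&= \Big\{ \bigcup \{ 1_{R_1} \times \cdots \times 1_{R_k} \times [\![Q_{\phi}]\!]^{(R_1, \dots, R_k, R'_{k+1})} \mid (R_1, \dots, R_k, R'_{k+1}) \in W' \} \;\Big|\; (R_1, \dots, R_{k+1}) \in W' \Big\} \\
&= \Big\{ 1_{R_1} \times \cdots \times 1_{R_k} \times \bigcup \{ [\![Q_{\phi}]\!]^{(R_1, \dots, R_k, R'_{k+1})} \mid R'_{k+1} \subseteq D^{l} \} \Big\}.
\end{aligned}
$$

Thus, in a given world $(R_1, \dots, R_k)$, $Q$ produces exactly one world as the result,

$$[\![Q]\!]^{(R_1, \dots, R_k)}_{W} = \Big\{ \bigcup \{ [\![Q_{\phi}]\!]^{(R_1, \dots, R_k, R'_{k+1})} \mid R'_{k+1} \subseteq D^{l} \} \Big\} = \{R_Q\}$$

and this captures exactly second-order existential quantification.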

The WSA expression for universal second-order quantifiers $\forall R_{k+1} (\subseteq D^{l})\; \phi$ is similar.
Alternatively, $\forall R_{k+1}\; \phi$ can also be taken as $\neg \exists R_{k+1}\; \neg \phi$, where complementation with
respect to $D$ is straightforward using the difference operation. □

Example 4.2 $\Sigma_2$-QBF is the following $\Sigma_2^P$-complete decision problem. Given two disjoint
sets of propositional variables $V_1$ and $V_2$ and a DNF formula $\phi$ over the variables of $V_1$ and
$V_2$, does there exist a truth assignment for the variables $V_1$ such that $\phi$ is true for all truth
assignments for the variables $V_2$?

Instances of this problem shall be represented by sets $V_1$ and $V_2$, a set $C$ of ids of
clauses in $\phi$, and a ternary relation $L(C, P, S)$ such that $\langle c, p, 1 \rangle \in L$ (resp., $\langle c, p, 0 \rangle \in L$) iff
propositional variable $p$ occurs positively (resp., negatively) in clause $c$ of $\phi$, i.e.,

$$\phi = \bigvee_{c \in C} \Big( \bigwedge_{\langle c, p, 1 \rangle \in L} p \;\wedge \bigwedge_{\langle c, p, 0 \rangle \in L} \neg p \Big).$$

The QBF is true iff second-order sentence

$$\exists P_1\, (P_1 \subseteq V_1) \wedge \forall P_2\, (P_2 \subseteq V_2) \Rightarrow \psi$$

is true, where $\psi$ is the first-order sentence

$$\exists c\; \neg \exists p\; (L(c, p, 0) \wedge (P_1(p) \vee P_2(p))) \vee (L(c, p, 1) \wedge \neg (P_1(p) \vee P_2(p))).$$

which asserts the truth of $\phi$: that there is a clause $c$ in $\phi$ of which no literal is inconsistent
with the truth assignment $p \mapsto (p \in P_1 \cup P_2)$. By Theorem 4.1, this can be expressed as
the Boolean WSA query

$$
\begin{aligned}
&\text{let } P_1 := \text{subset}(V_1) \text{ in possible}(\{\langle\rangle\} \\
&\quad - \text{let } P_2 := \text{subset}(V_2) \text{ in possible}_{sch(P_1)}(1_{P_1} \times (\{\langle\rangle\} - Q)))
\end{aligned}
$$

where

$$Q = \pi_{\emptyset}(C - \pi_{C}((\sigma_{S=0}(L) \bowtie (P_1 \cup P_2)) \cup (\sigma_{S=1}(L) \bowtie ((V_1 \cup V_2) - (P_1 \cup P_2)))))$$

is relational algebra for $\psi$. □
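The meaning of the second-order sentence in Example 4.2 can be spelled out by exhaustive search. The sketch below assumes the instance encoding described above (sets $V_1$, $V_2$, $C$ and the ternary relation $L$); the function names and the two toy formulas are hypothetical, and the search is exponential by design.

```python
from itertools import combinations


def subsets(variables):
    """Yield every subset of the given variables as a set."""
    ordered = sorted(variables)
    for n in range(len(ordered) + 1):
        for chosen in combinations(ordered, n):
            yield set(chosen)


def psi(C, L, true_vars):
    """Some clause c has no literal inconsistent with the assignment."""
    return any(
        all((p in true_vars) == (s == 1) for (cid, p, s) in L if cid == c)
        for c in C
    )


def sigma2_qbf(V1, V2, C, L):
    """Exists P1 within V1 such that for all P2 within V2, psi holds."""
    return any(
        all(psi(C, L, P1 | P2) for P2 in subsets(V2))
        for P1 in subsets(V1)
    )


# (x and y) or (x and not y): choosing x = true works for every y.
L1 = {("c1", "x", 1), ("c1", "y", 1), ("c2", "x", 1), ("c2", "y", 0)}
holds = sigma2_qbf({"x"}, {"y"}, {"c1", "c2"}, L1)

# (x and y) alone fails for y = false, whatever x is.
L2 = {("c1", "x", 1), ("c1", "y", 1)}
fails = sigma2_qbf({"x"}, {"y"}, {"c1"}, L2)
```

The nesting of `any` over `all` mirrors the quantifier prefix $\exists P_1 \forall P_2$, which in the WSA query is realized by the two subset definitions and the two possible operations with difference.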



For the converse result, we must first make precise how second-order logic will be
compared to WSA, since second-order logic queries are usually not "run" on uncertain
databases. We will consider WSA queries that are evaluated against a (single-world)
relational database $\mathcal{A}$ representing an uncertain database (e.g., using the standard
representation of Example 2.1). We already know that arbitrary uncertain databases (that is,
nonempty finite sets of possible worlds) can be so represented, and this assumption means
no loss of generality. The query constructs the uncertain database from the representation
and is always evaluated as $[\![Q]\!]^{\mathcal{A}}_{\{\mathcal{A}\}}$, precisely as sketched at the end of Section 2.

Theorem 4.3 For every WSA query, there is an equivalent second-order logic query.

Proof Sketch. The proof revolves around the definition of a function $[\![\cdot]\!]_{SO}$ that maps each
WSA expression $Q$ to an SO formula $[\![Q]\!]_{SO}$ with free second-order variables $\vec{R}$ and $R_Q$ and
without free first-order variables such that $[\![Q]\!]_{SO}$ and $Q$ are equivalent in the sense that
$[\![Q]\!]_{SO}$ is true on structure $(\mathcal{A}, \vec{R}, R_Q)$ iff $R_Q$ is among the possible results of $Q$ starting from
possible world $(\mathcal{A}, \vec{R})$. We can state this notion of correctness, which is the hypothesis of
the following induction along the structure of the WSA expression, formally as

$$(\mathcal{A}, \vec{R}, R_Q) \models [\![Q]\!]_{SO} \;\Leftrightarrow\; R_Q \in [\![Q]\!]^{(\mathcal{A}, \vec{R})}_{W}$$

for

$$W = \Big\{ (\mathcal{A}, \vec{R}) \;\Big|\; (\mathcal{A}, \vec{R}) \models \bigwedge_{V \text{ in } \vec{R}} \psi_V \Big\}.$$

Here the free second-order variables $\vec{R}$ are also the names of the views defined (using
let-expressions) along the path from the root of the parse tree of the query to the subexpression
$Q$. A formula $\psi_V$ is identified by the name of the view relation $V$, assuming without loss
of generality that each view name is introduced only once by a let expression across the
entire query. The formulae $\psi_V$ will be defined below.
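For the operations $\theta$ of relational algebra,

$$[\![\theta(Q_1, \dots, Q_{ar(\theta)})]\!]_{SO}(\vec{R}, R_Q) := \exists R_{Q_1} \cdots R_{Q_{ar(\theta)}} \Big( \bigwedge_{i=1}^{ar(\theta)} [\![Q_i]\!]_{SO}(\vec{R}, R_{Q_i}) \Big) \wedge \forall \vec{x}\; R_Q(\vec{x}) \Leftrightarrow \phi_{\theta(Q_1, \dots, Q_{ar(\theta)})}(\vec{x})$$

where $0 \le ar(\theta) \le 2$ and $\phi_{S}(\vec{x}) := S(\vec{x})$, where $S$ is either a relation from $\mathcal{A}$ or a
second-order variable from $\vec{R}$, $\phi_{\{\vec{t}\}}(\vec{x}) := \vec{x} = \vec{t}$, $\phi_{Q_1 \cup Q_2}(\vec{x}) := R_{Q_1}(\vec{x}) \vee R_{Q_2}(\vec{x})$,
$\phi_{Q_1 - Q_2}(\vec{x}) := R_{Q_1}(\vec{x}) \wedge \neg R_{Q_2}(\vec{x})$, $\phi_{Q_1 \times Q_2}(\vec{x}, \vec{y}) := R_{Q_1}(\vec{x}) \wedge R_{Q_2}(\vec{y})$,
$\phi_{\sigma_{\gamma}(Q)}(\vec{x}) := R_Q(\vec{x}) \wedge \gamma$, $\phi_{\pi_{\vec{x}}(Q)}(\vec{x}) := \exists \vec{y}\; R_Q(\vec{x}, \vec{y})$, and
$\phi_{\rho_{\vec{x} \to \vec{y}}(Q)}(\vec{y}) := \exists \vec{x}\; R_Q(\vec{x}) \wedge \vec{x} = \vec{y}$. It is easy to verify that for any tuple $\vec{x}$
and relational algebra operation $\theta$, $(\mathcal{A}, R_{Q_1}, \dots, R_{Q_{ar(\theta)}}) \models \phi_{\theta(Q_1, \dots, Q_{ar(\theta)})}(\vec{x})$ if and only if $\vec{x}$
is a result tuple of relational algebra query $\theta(R_{Q_1}, \dots, R_{Q_{ar(\theta)}})$. Assume that the induction
hypothesis holds for the subqueries $Q_1, \dots, Q_{ar(\theta)}$, i.e., $(\mathcal{A}, \vec{R}, R_{Q_i}) \models [\![Q_i]\!]_{SO}$ if and only if
$R_{Q_i} \in [\![Q_i]\!]^{(\mathcal{A}, \vec{R})}_{W}$ for $1 \le i \le ar(\theta)$. The formula $[\![\theta(Q_1, \dots, Q_{ar(\theta)})]\!]_{SO}$ just states that $R_Q$ is
a relation consisting of exactly those tuples $\vec{x}$ that satisfy $\phi_{\theta(Q_1, \dots, Q_{ar(\theta)})}(\vec{x})$ for a choice of
possible results $R_{Q_i} \in [\![Q_i]\!]^{(\mathcal{A}, \vec{R})}_{W}$ of the subqueries $Q_i$, for $1 \le i \le ar(\theta)$. But this is exactly
the definition of $[\![\theta(Q_1, \dots, Q_{ar(\theta)})]\!]^{(\mathcal{A}, \vec{R})}_{W}$.

This in particular covers the nullary operations of relational algebra ($\{\vec{t}\}$ and $R$), which
form the induction start.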

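The remaining operations are those special to WSA (with definitions):

$$
\begin{aligned}
[\![\text{subset}(Q_1)]\!]_{SO}(\vec{R}, R_Q) :=\ & \exists R_{Q_1}\; [\![Q_1]\!]_{SO}(\vec{R}, R_{Q_1}) \wedge R_Q \subseteq R_{Q_1} \\
[\![\text{repair-key}_{\vec{A}}(Q_1)]\!]_{SO}(\vec{R}, R_Q) :=\ & \exists R_{Q_1}\; [\![Q_1]\!]_{SO}(\vec{R}, R_{Q_1}) \wedge R_Q \subseteq R_{Q_1} \\
& \wedge\ \vec{A} \text{ is a key for } R_Q \\
& \wedge\ \neg \exists R'_Q\; R_Q \subset R'_Q \subseteq R_{Q_1} \wedge \vec{A} \text{ is a key for } R'_Q \\
[\![\text{let } V := Q_1 \text{ in } Q_2]\!]_{SO}(\vec{R}, R_Q) :=\ & \exists V\; \psi_V \wedge [\![Q_2]\!]_{SO}(\vec{R}, V, R_Q) \\
& \text{and define } \psi_V := [\![Q_1]\!]_{SO}(\vec{R}, V) \\
[\![\text{possible}_{\vec{A}}(Q_1)]\!]_{SO}(\vec{R}, R_Q) :=\ & \exists R_{Q_1}\; [\![Q_1]\!]_{SO}(\vec{R}, R_{Q_1}) \wedge \forall \vec{x}\; R_Q(\vec{x}) \Leftrightarrow \\
& \exists \vec{R}' \Big[ \Big( \bigwedge_{V \text{ in } \vec{R}'} \psi_V \Big) \wedge \exists R'_{Q_1}\; [\![Q_1]\!]_{SO}(\vec{R}', R'_{Q_1}) \\
& \quad \wedge\ \pi_{\vec{A}}(R_{Q_1}) = \pi_{\vec{A}}(R'_{Q_1}) \wedge R'_{Q_1}(\vec{x}) \Big]
\end{aligned}
$$

where "$\vec{A}$ is a key for $R$" and $\pi_{\vec{A}}(\cdot) = \pi_{\vec{A}}(\cdot)$ are easily expressible in FO.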

It is straightforward to verify the correctness of $[\![\cdot]\!]_{SO}$ for subset and repair-key: The
definitions of $[\![\cdot]\!]_{SO}$ and $[\![\cdot]\!]$ essentially coincide.

Similarly, the correctness of the definition of $[\![\cdot]\!]_{SO}$ for let is easy to verify. Here we also
define the formulae $\psi_V$.

Finally, $[\![\text{possible}_{\vec{A}}(Q_1)]\!]_{SO}$ makes reference to world-set $W$ and for that purpose uses
the formulae $\psi_V$. Indeed, the worlds in $W$ are exactly those structures that satisfy all the
$\psi_V$ for relations $V$ defined by let expressions on the path from the root of the query to the
current subexpression $\text{possible}_{\vec{A}}(Q_1)$. The definition of $[\![\text{possible}_{\vec{A}}(Q_1)]\!]_{SO}$ is again very close
to the definition of $[\![\text{possible}_{\vec{A}}(Q_1)]\!]$, and its correctness is straightforward to verify.

Note that by eliminating the definitions $\psi_V$ we in general obtain an exponential-size
formula. □

5 Intermezzo: Why we are not done 

The proof that WSA with definitions can express any SO query may seem to settle the 
expressiveness question for our language. However, understanding WSA without definitions 
is also important, for two reasons. First, it is a commonplace and desirable property 
of query algebras that they be compositional, i.e., that the power to define views is not 
needed for the expressive power, and all views can be eliminated by composing the query. 
Second, if this property does not hold, it means that in general we have to precompute and 
materialize views. And indeed, superficially we would expect that WSA is not compositional 
in that respect: it supports nondeterministic operations (repair-key and/or subset). If a 
view definition V contains such a nondeterministic operation and a query uses V at least 
twice, replacing each occurrence with the definition will not be equivalent because the two 
copies of the definition of V will produce different relations in some worlds. For example, 
(let $V := \text{subset}(U)$ in $V \bowtie V$) is not at all equivalent to $\text{subset}(U) \bowtie \text{subset}(U)$.
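The effect of duplicating a nondeterministic view is easy to observe on a toy universe. The sketch below is an illustration only and deliberately swaps in difference and Cartesian product for the join of the example above, because with these two operations the gap is directly visible in the sets of result relations. The universe `U` and all names are hypothetical.

```python
from itertools import combinations


def subset_worlds(U):
    """All possible results of subset(U): the powerset of U."""
    ordered = sorted(U)
    return {
        frozenset(c)
        for n in range(len(ordered) + 1)
        for c in combinations(ordered, n)
    }


def cross(r, s):
    """Cartesian product of two unary relations given as sets of values."""
    return frozenset((a, b) for a in r for b in s)


U = {1, 2}
worlds = subset_worlds(U)

# let V := subset(U) in V - V: both occurrences refer to the same choice.
view_diff = {v - v for v in worlds}
# subset(U) - subset(U): the two copies choose independently.
composed_diff = {v1 - v2 for v1 in worlds for v2 in worlds}

# The same comparison with product in place of difference.
view_prod = {cross(v, v) for v in worlds}
composed_prod = {cross(v1, v2) for v1 in worlds for v2 in worlds}
```

With the view, the difference is always empty; after naive composition every subset of `U` becomes a possible result.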
The question remains whether for each WSA query there is an equivalent query in WSA
without definitions via a less direct rewriting. The answer to this question is less obvious. 
Our language definition has assumed repair-key to be the base operation and subset de- 
finable using WSA with repair-key. Indeed, in WSA with definitions, either one can be 
defined using the other. However, it can be shown that repair-key cannot be expressed 
using subset without using definitions even though subset can guess subsets and appears 
comparable in expressiveness to repair-key. 

Consider possible worlds databases in which each relation is independent from the other
relations, i.e., the world set is of the form

$$\{(R_1, \dots, R_k) \mid R_1 \in W_1, \dots, R_k \in W_k\}.$$

WSA without definitions on such relation-independent databases gives rise to a much simpler
and more intuitive semantics definition than the one of Section 3 via the following function
$[\![\cdot]\!]_{ndef}$.

$$
\begin{aligned}
[\![\theta]\!]_{ndef}(W_1, \dots, W_{ar(\theta)}) &:= \{\theta(R_1, \dots, R_{ar(\theta)}) \mid R_1 \in W_1, \dots, R_{ar(\theta)} \in W_{ar(\theta)}\} \\
& \qquad \ldots \text{ where } \theta \text{ is an operation of relational algebra} \\
[\![\text{repair-key}_{\vec{A}}]\!]_{ndef}(W) &:= \{R \mid R \subseteq R' \in W,\ \pi_{\vec{A}}(R) = \pi_{\vec{A}}(R'),\ \vec{A} \text{ is a key for } R\} \\
[\![\text{subset}]\!]_{ndef}(W) &:= \{R \mid R \subseteq R' \in W\} \\
[\![\text{possible}_{\vec{A}}]\!]_{ndef}(W) &:= \Big\{ \bigcup \{R' \in W \mid \pi_{\vec{A}}(R) = \pi_{\vec{A}}(R')\} \;\Big|\; R \in W \Big\}
\end{aligned}
$$

The correctness of this alternative semantics definition, stated next, is easy to verify.

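Proposition 5.1 For relation-independent databases and WSA without definitions, $[\![\cdot]\!]_{ndef}$
is equivalent to $[\![\cdot]\!]$ in the sense that for any operation $\theta$,

$$\bigcup \{ [\![\theta(Q_1, \dots, Q_{ar(\theta)})]\!]^{\mathcal{A}}_{W} \mid \mathcal{A} \in W \} = [\![\theta]\!]_{ndef}(W_1, \dots, W_{ar(\theta)})$$

where $W_i = \bigcup \{ [\![Q_i]\!]^{\mathcal{A}}_{W} \mid \mathcal{A} \in W \}$ for all $1 \le i \le ar(\theta)$.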

The following result asserts that adding subset to relational algebra yields little
expressive power. By the existence of a supremum of a set of worlds $W$, we assert the existence of
an element $(\bigcup W) \in W$, denoted $\sup(W)$. An infimum is a set $\inf(W) := (\bigcap W) \in W$.

Theorem 5.2 Any world-set computable using relational algebra extended by the operation 
subset has a supremum and an infimum. 

Proof. The nullary relational algebra expressions ($\{\vec{t}\}$ and $R$) yield just a singleton
world-set, and the single world is both the supremum and the infimum. Given a world-set
$W$, $\sup([\![\text{subset}]\!]_{ndef}(W)) := \sup(W)$ and $\inf([\![\text{subset}]\!]_{ndef}(W)) := \emptyset$. For a positive
relational algebra expression $\theta$, $\sup([\![\theta]\!]_{ndef}(W_1, \dots, W_k)) := \theta(\sup(W_1), \dots, \sup(W_k))$ and
$\inf([\![\theta]\!]_{ndef}(W_1, \dots, W_k)) := \theta(\inf(W_1), \dots, \inf(W_k))$. For relational difference, it can be
verified that $\sup([\![-]\!]_{ndef}(W_1, W_2)) := \sup(W_1) - \inf(W_2)$ and $\inf([\![-]\!]_{ndef}(W_1, W_2)) :=
\inf(W_1) - \sup(W_2)$. It is easy to verify the correctness of these definitions, and together
they yield the theorem. □

Thus, not even $\text{repair-key}_{\emptyset}(\{0, 1\}) = \{\{0\}, \{1\}\}$ can be defined.
Corollary 5.3 The set of worlds $\{\{0\}, \{1\}\}$ is not definable in relational algebra extended
by subset. 
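The supremum and infimum conditions behind Theorem 5.2 and Corollary 5.3 can be tested mechanically on small world-sets. The sketch below is a sanity check under the assumption that worlds are frozensets of values; the function names and the example world-sets are hypothetical.

```python
from itertools import combinations


def powerset(values):
    ordered = sorted(values)
    return {
        frozenset(c)
        for n in range(len(ordered) + 1)
        for c in combinations(ordered, n)
    }


def has_supremum(worlds):
    """The union of all worlds is itself one of the worlds."""
    return frozenset().union(*worlds) in worlds


def has_infimum(worlds):
    """The intersection of all worlds is itself one of the worlds."""
    return frozenset.intersection(*worlds) in worlds


def difference_ndef(W1, W2):
    """Difference under the relation-independent semantics."""
    return {r1 - r2 for r1 in W1 for r2 in W2}


two_worlds = {frozenset({0}), frozenset({1})}
W1 = powerset({0, 1})   # result of subset on {0, 1}
W2 = powerset({1, 2})   # result of subset on {1, 2}
diff = difference_ndef(W1, W2)
sup_diff = frozenset().union(*diff)
```

The world-set $\{\{0\}, \{1\}\}$ contains neither its union nor its intersection, whereas the difference of two subset results has both, with supremum $\sup(W_1) - \inf(W_2)$.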

In contrast, $\text{repair-key}_{sch(U)}(U \times \{0, 1\})$ can be defined as follows in the language
fragment of relational algebra plus subset if definitions are available:

$$\text{let } R := \text{subset}(U) \text{ in } (R \times \{1\} \cup (U - R) \times \{0\}).$$

Thus, removing definitions seems to cause a substantial reduction of expressive power. 
In the remainder of this paper, we study whether possible$_{\vec A}$ and repair-key can offset this.

Before we move on, another simple result shall be stated that gives an intuition for the 
apparent weakness of WSA without definitions. If a view is defined by a query that involves 
one of the nondeterministic operations (possible$_{\vec A}$ or repair-key), then this view can only
be used at one place in the query if the query is to be composed with the view. However, 
subsequent relational algebra operations will be monotonic with respect to that view. 

Proposition 5.4 Let Q be a nonmonotonic relational algebra query that is built using a 
relation R and constant relations. Then R occurs at least twice in Q. 

Proof. Assume a relational algebra query tree exists that expresses $Q$ and in which $R$ only occurs as a single leaf. Then the path from that leaf towards the root operation consists of unary operations and operations $Q_1 \mathbin{\theta} Q_2$ where $Q_1$ contains $R$ and $Q_2$ has only constant relations as leaves: $Q_2$ is constant. So $Q_1 \mathbin{\theta} Q_2$ can be thought of as a unary operation. But all unary operations $\theta$ are monotonic, i.e., if $X \subseteq Y$, then $\theta(X) \supseteq \theta(Y)$ for the family of operations $(C - X)_{C\ \text{constant},\ sch(C) = sch(X)}$ and $\theta(X) \subseteq \theta(Y)$ for all other operations. It follows that $Q$, a sequence of such operations, is also monotonic. □

6 WSA without Definitions Expresses all of SO Logic 

As the main technical result of the paper, we now show that WSA without definitions (but using repair-key as in our language definition) captures all of SO. It follows that definitions, despite our nondeterministic operations, do not add power to the language. This is surprising given Theorem 5.2.

6.1 Indicator Relations 

Let $U$ be a nonempty relation (the universe) and let $R \subseteq U$. Then the indicator function $1_R : U \to \{0, 1\}$ is defined as

$$1_R : x \mapsto \begin{cases} 1 & \dots\ x \in R \\ 0 & \dots\ x \notin R \end{cases}$$

The corresponding indicator relation is just the relation $\{\langle x, 1_R(x) \rangle \mid x \in U\}$ which, obviously, has functional dependency $U \to \{0, 1\}$. Subsequently, we will always use indicator relations rather than indicator functions and will denote them by $1_R$ as well. By our assumption that $U \neq \emptyset$, indicator relations are always nonempty.

Given relations $R$ and $U$ with $R \subseteq U \neq \emptyset$, the indicator relation $1_R$ w.r.t. universe $U$ can be constructed in relational algebra as

$$\text{ind}(R, U) := (R \times \{1\}) \cup ((U - R) \times \{0\}).$$

The expression repair-key$_{sch(U)}(U \times \{0, 1\})$ is equivalent to

let $R := \text{subset}(U)$ in $\text{ind}(R, U)$

and yields an indicator relation in each possible world.
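The stated equivalence can be checked by enumeration on a small universe. This is a sketch under the assumption that relations are frozensets of tuples; `ind`, `subsets`, and `repair_key_worlds` are illustrative names only.

```python
from itertools import combinations, product

def ind(r, u):
    """ind(R, U) := (R x {1}) u ((U - R) x {0})."""
    return frozenset({x + (1,) for x in r} | {x + (0,) for x in u - r})

def subsets(u):
    """All subsets of U, i.e. the worlds produced by subset(U)."""
    items = sorted(u)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]

def repair_key_worlds(u):
    """Worlds of repair-key with key sch(U) on U x {0,1}: one bit per tuple of U."""
    items = sorted(u)
    return {frozenset(x + (b,) for x, b in zip(items, bits))
            for bits in product((0, 1), repeat=len(items))}

U = frozenset({("a",), ("b",), ("c",)})
via_let = {ind(r, U) for r in subsets(U)}   # let R := subset(U) in ind(R, U)
```

Both constructions produce the same eight worlds for this three-element universe, one indicator relation per subset of $U$.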

Indicator relations have the nice property that their complement can be computed using a conjunctive query (with an inequality),

$$1_{U-R} = (U \times \{0, 1\}) - 1_R := \pi_{1,2}(\sigma_{1=3 \wedge 2 \neq 4}(U \times \{0, 1\} \times 1_R)).$$
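As a small sanity check of this identity, the following sketch evaluates the conjunctive query for a unary universe and compares it with the set difference. The function names are hypothetical, and the column numbering in the comments follows the formula above.

```python
def ind(r, u):
    """ind(R, U) := (R x {1}) u ((U - R) x {0})."""
    return frozenset({x + (1,) for x in r} | {x + (0,) for x in u - r})

def complement_by_join(u, one_r):
    """Select rows of U x {0,1} x 1_R with column 1 = column 3 and column 2 != column 4,
    then project onto columns 1 and 2."""
    return frozenset((x, b)
                     for (x,) in u for b in (0, 1)   # columns 1 and 2
                     for (y, c) in one_r             # columns 3 and 4
                     if x == y and b != c)

U = frozenset({("a",), ("b",), ("c",)})
R = frozenset({("a",), ("c",)})
one_r = ind(R, U)
full = frozenset((x, b) for (x,) in U for b in (0, 1))
```

No difference operation is applied to `one_r` itself; the inequality on the bit column does the work.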

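Let $\overline{R}$ denote the complement of relation $R$ and let $U_i = R_i \cup \overline{R_i}$, called the universe of $R_i$. Note that

$$\overline{R_1 \times \cdots \times R_k} = \bigcup_{i=1}^{k} U_1 \times \cdots \times U_{i-1} \times \overline{R_i} \times U_{i+1} \times \cdots \times U_k.$$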
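The identity for the complement of a product can be verified on a toy instance. The sketch below uses plain Python sets of values for the $R_i$ and $U_i$; the function name is an assumption made for illustration.

```python
from itertools import product

def complement_of_product(universes, relations):
    """Union over i of U_1 x ... x complement(R_i) x ... x U_k."""
    out = set()
    for i, (u_i, r_i) in enumerate(zip(universes, relations)):
        factors = list(universes)
        factors[i] = u_i - r_i          # complement of R_i w.r.t. its universe
        out |= set(product(*factors))
    return out

universes = [{1, 2, 3}, {"a", "b"}]
relations = [{1, 2}, {"a"}]
everything = set(product(*universes))
direct = everything - set(product(*relations))
```

A tuple lies outside the product exactly when at least one of its components lies outside the corresponding $R_i$, which is what the union expresses.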

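The complement of a product $1 := 1_{R_1} \times \cdots \times 1_{R_k}$ can be obtained as

$$\begin{aligned}
\text{compl}_{U_1,\dots,U_k}(1) &= (U_1 \times \{0,1\} \times \cdots \times U_k \times \{0,1\}) - 1 \\
&= \pi_{A_1,B_1,\dots,A_k,B_k}\big(\sigma_{\bigvee_i (A_i = A'_i \wedge B_i \neq B'_i)}(\rho_{A'_1 B'_1 \dots A'_k B'_k}(1) \times \rho_{A_1 B_1 \dots A_k B_k}(U_1 \times \{0,1\} \times \cdots \times U_k \times \{0,1\}))\big)
\end{aligned}$$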

if, for each $1 \le i \le k$, $U_i$ is the universe of $R_i$. Moreover, the following holds.

Lemma 6.1 The $k$-times product of $1_R$, denoted by $(1_R)^k_U := \underbrace{1_R \times \cdots \times 1_R}_{k \text{ times}}$, can be expressed as a relational algebra expression in which $1_R$ only occurs once.

Proof. Let $U$ be the universe of $R$.

$$\begin{aligned}
(1_R)^k_U &= \rho_{A_1 B_1 \dots A_k B_k}((U \times \{0,1\})^k) - \text{compl}_{U,\dots,U}(1_R^k) \\
&= \rho_{A_1 B_1 \dots A_k B_k}((U \times \{0,1\})^k) \\
&\quad - \pi_{A_1,B_1,\dots,A_k,B_k}\big(\sigma_{\bigvee_{1 \le i \le k}(A_i = A' \wedge B_i \neq B')}(\rho_{A_1 B_1 \dots A_k B_k}((U \times \{0,1\})^k) \times \rho_{A'B'}(1_R))\big).
\end{aligned}$$

□

As a convention, let $S^0 = \{\langle\rangle\}$ for nonempty relations $S$. In particular, $(1_R)^0_U = \{\langle\rangle\}$.
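The construction of Lemma 6.1 can be mimicked directly: start from $(U \times \{0,1\})^k$ and remove every tuple that disagrees with $1_R$ in some slot. The sketch below is illustrative only; `k_fold_product` is a hypothetical name, and `one_r` is referenced exactly once in its body, mirroring the single occurrence of $1_R$.

```python
from itertools import product

def k_fold_product(u, one_r, k):
    """k-fold product of 1_R built from (U x {0,1})^k and one scan of 1_R."""
    cells = [(x, b) for x in sorted(u) for b in (0, 1)]
    everything = set(product(cells, repeat=k))            # (U x {0,1})^k
    # A tuple is removed if some slot (a, b) has a = y but b != c for a row (y, c) of 1_R.
    disagreeing = {t for t in everything for (y, c) in one_r
                   if any(a == y and b != c for (a, b) in t)}
    return everything - disagreeing

U = {"a", "b"}
one_r = {("a", 1), ("b", 0)}       # indicator relation of R = {a} w.r.t. U
```

For $k = 0$ the result is the relation containing only the empty tuple, in line with the convention $S^0 = \{\langle\rangle\}$.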

6.2 The Quantifier-Free Case 

By quantifier-free formulae we will denote formulae of predicate logic that have neither 
first- nor second-order quantifiers. 

Lemma 6.2 Let $\phi$ be a quantifier-free formula with relations $\mathbf{R}$. Then $\phi$ can be translated in linear time into a formula $\exists \vec x\ \alpha \wedge \beta$, where $\alpha$ is a Boolean combination of equalities and $\beta$ is a conjunction of relational literals, which is equivalent to $\phi$ on structures in which each relation of $\mathbf{R}$ and its complement are nonempty.

Proof Sketch. Let $R_1, \dots, R_s$ be the set of distinct predicates (relation names) occurring in $\phi$. First push negations in $\phi$ down to the atomic formulae using De Morgan's laws and the elimination of double negation and replace relational atomic formulae $\neg R_j(\vec t)$, where $\vec t$ is a tuple of variables and constants, by $\overline{R_j}(\vec t)$.

Now apply the following translation inductively bottom-up. The translation is the identity on (in)equality literals. Rewrite atomic formulae $R_j(\vec t)$ into $\exists \vec v_{j1}\ \vec v_{j1} = \vec t \wedge R_j(\vec v_{j1})$ and atoms $\overline{R_j}(\vec t)$ into $\exists \vec w_{j1}\ \vec w_{j1} = \vec t \wedge \overline{R_j}(\vec w_{j1})$. Let

$$\gamma_{j,m,m'} = \bigwedge_{k=1}^{m} R_j(\vec v_{jk}) \wedge \bigwedge_{k=1}^{m'} \overline{R_j}(\vec w_{jk}).$$

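A subformula $\psi_1 \vee \psi_2$ (resp., $\psi_1 \wedge \psi_2$) with

$$\psi_i = \exists \vec v \vec w\ \alpha_i \wedge \bigwedge_{j=1}^{s} \gamma_{j,n_{ij},n'_{ij}}$$

is turned into

$$\exists \vec v \vec w\ \alpha \wedge \bigwedge_{j=1}^{s} \gamma_{j,m_j,m'_j}$$

where $m_j = \max(n_{1j}, n_{2j})$, $m'_j = \max(n'_{1j}, n'_{2j})$ and $\alpha = \alpha_1 \vee \alpha_2$ (resp., $m_j = n_{1j} + n_{2j}$, $m'_j = n'_{1j} + n'_{2j}$, $\alpha = \alpha_1 \wedge \alpha'_2$, and $\alpha'_2$ is obtained from $\alpha_2$ by replacing each variable $v_{jki}$ by $v_{j(k+n_{1j})i}$ and each variable $w_{jki}$ by $w_{j(k+n'_{1j})i}$).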

For the equivalence of the rewritten formula to $\phi$, it is only necessary to point out that since all the relations $R_j$ and $\overline{R_j}$ are nonempty, $\psi_i$ is equivalent to

$$\exists \vec v \vec w\ \alpha_i \wedge \bigwedge_{j=1}^{s} \gamma_{j,m_j,m'_j}.$$

It is not hard to verify that the translation can indeed be implemented to run in linear time and that the rewritten formula is of the form claimed in the lemma. □

Theorem 6.3 For any quantifier-free formula there is an equivalent expression in WSA 
over universe relations and indicator relations in which each indicator relation only occurs 
once. 

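Proof Sketch. Assume $R_1, \dots, R_s$ are all the predicates occurring in the formula. By Lemma 6.2, we only need to consider formulae of syntax

$$\phi = \exists \vec v \vec w\ \alpha \wedge \bigwedge_{j=1}^{s} \Big( \bigwedge_{k=1}^{m_j} R_j(\vec v_{jk}) \wedge \bigwedge_{k=1}^{m'_j} \neg R_j(\vec w_{jk}) \Big)$$

where $\alpha$ does not contain relational atoms if each relation $R_j$ is nonempty and different from $U_j$. Such a formula $\phi$ is equivalent to

$$\exists \vec v \vec w \vec t \vec t'\ \alpha' \wedge \bigwedge_{j=1}^{s} \Big( \bigwedge_{k=1}^{m_j} 1_{R_j}(\vec v_{jk}, t_{jk}) \wedge \bigwedge_{k=1}^{m'_j} 1_{R_j}(\vec w_{jk}, t'_{jk}) \Big)$$

with

$$\alpha' = \alpha \wedge \bigwedge_{j=1}^{s} \Big( \bigwedge_{k=1}^{m_j} t_{jk} = 1 \wedge \bigwedge_{k=1}^{m'_j} t'_{jk} = 0 \Big).$$

This is true because $R_j(\vec v_{jk})$ is equivalent to $1_{R_j}(\vec v_{jk}, 1)$ and $\neg R_j(\vec w_{jk})$ is equivalent to $1_{R_j}(\vec w_{jk}, 0)$. Obtaining formulae of this form is indeed feasible because $1_{R_j} \neq \emptyset$ and $\overline{1_{R_j}} \neq \emptyset$.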

Let $\vec x$ be the free variables of the formula. The WSA expression is

$$\pi_{\vec x}(\sigma_{\alpha'}(B_1 \times \cdots \times B_s))$$

with

$$B_j := \rho_{v_{j1} t_{j1} \dots v_{jm_j} t_{jm_j} w_{j1} t'_{j1} \dots w_{jm'_j} t'_{jm'_j}}\big((1_{R_j})^{m_j + m'_j}_{U_j}\big).$$

Each $B_j$ computes an $(m_j + m'_j)$-times product of $1_{R_j}$ using the technique of Lemma 6.1, which just uses one occurrence of $1_{R_j}$. All the relations $1_{R_j}$ only occur once. This proves the theorem. □

Example 6.4 Consider an alternative encoding of 3-colorability in WSA which is based on guessing a subset of relation $U = V \times \rho_C(\{r, g, b\})$. Then 3-colorability is the problem of deciding the SO sentence $\exists C (\subseteq U)\ \neg\exists v, w, c, c'\ \phi_1 \vee \phi_2 \vee \phi_3$ with $\phi_1 = E(v, w) \wedge C(v, c) \wedge C(w, c)$, $\phi_2 = C(v, c) \wedge C(v, c') \wedge c \neq c'$, and $\phi_3 = \neg C(v, r) \wedge \neg C(v, g) \wedge \neg C(v, b)$, i.e., $\phi_1$ asserts that two neighboring nodes have the same color, $\phi_2$ that a node has simultaneously two colors, and $\phi_3$ that a node has not been assigned any color at all. If neither is the case, we have a 3-coloring of the graph. Using Theorem 6.3, $\phi_1 \vee \phi_2 \vee \phi_3$ becomes

$$\pi = (\psi_1 \vee \psi_2 \vee \psi_3) \wedge 1_C(u_1, c_1, t_1) \wedge 1_C(u_2, c_2, t_2) \wedge 1_C(u_3, c_3, t_3) \wedge 1_E(v, w, t_4)$$

where

$$\begin{aligned}
\psi_1 &= u_1 = v \wedge u_2 = w \wedge c_1 = c_2 \wedge t_1 = t_2 = t_4 = 1 \\
\psi_2 &= u_1 = u_2 \wedge c_1 \neq c_2 \wedge t_1 = t_2 = 1 \\
\psi_3 &= u_1 = u_2 = u_3 \wedge c_1 = r \wedge c_2 = g \wedge c_3 = b \wedge t_1 = t_2 = t_3 = 0.
\end{aligned}$$

Following Theorem 6.3, formula $\pi$ can be turned into WSA as

$$Q_\pi := \sigma_{\psi_1 \vee \psi_2 \vee \psi_3}\big(\rho_{u_1 c_1 t_1 u_2 c_2 t_2 u_3 c_3 t_3}((1_C)^3_{V \times \{r,g,b\}}) \times \rho_{v w t_4}(1_E)\big)$$

where $(1_C)^3_{V \times \{r,g,b\}}$ denotes the WSA expression for $1_C \times 1_C \times 1_C$ from Lemma 6.1. The complete SO sentence can be stated as

$$\exists 1_C\ (1_C : V \times \{r, g, b\} \to \{0, 1\}) \wedge \neg\exists u_1 c_1 t_1 u_2 c_2 t_2 u_3 c_3 t_3 v w t_4\ \pi.$$

If $1_C$ in $Q_\pi$ is replaced by repair-key$_{V,C}(V \times \rho_C(\{r, g, b\}) \times \rho_T(\{0, 1\}))$, this sentence can be turned into WSA without definitions as possible$(\{\langle\rangle\} - \pi_\emptyset(Q_\pi))$. □
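The meaning of the sentence in Example 6.4 can be illustrated by brute force. The sketch below does not build the algebra expression $Q_\pi$; under that simplifying assumption it enumerates the worlds that repair-key would create (one bit per vertex-color pair) and checks the three violation conditions $\phi_1$, $\phi_2$, $\phi_3$ directly. All names are hypothetical.

```python
from itertools import product

COLORS = ("r", "g", "b")

def worlds(vertices):
    """One world per indicator relation 1_C over V x {r, g, b}."""
    cells = [(v, c) for v in vertices for c in COLORS]
    for bits in product((0, 1), repeat=len(cells)):
        yield dict(zip(cells, bits))

def violated(one_c, vertices, edges):
    """True if phi_1, phi_2 or phi_3 is satisfiable in this world."""
    same_color = any(one_c[(v, c)] == 1 and one_c[(w, c)] == 1
                     for (v, w) in edges for c in COLORS)
    two_colors = any(one_c[(v, c)] == 1 and one_c[(v, d)] == 1
                     for v in vertices for c in COLORS for d in COLORS if c != d)
    no_color = any(all(one_c[(v, c)] == 0 for c in COLORS) for v in vertices)
    return same_color or two_colors or no_color

def three_colorable(vertices, edges):
    """The 'possible' step: is there a world in which no violation exists?"""
    return any(not violated(w, vertices, edges) for w in worlds(vertices))

triangle = ([0, 1, 2], [(0, 1), (1, 2), (0, 2)])
k4 = ([0, 1, 2, 3], [(a, b) for a in range(4) for b in range(a + 1, 4)])
```

The enumeration is exponential in the number of vertices, which matches the exponential number of possible worlds created by repair-key.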

6.3 Quantification and Alternation

Conceptually, in SO, there is no difference in the treatment of second-order variables and 
relations coming from the input structure; an existential second-order quantifier extends 
the structure over which the formula is evaluated. In our algebra, however, we have to 
construct the possible alternative relations for a second-order variable R at the beginning 
of the bottom-up evaluation of the algebra expression using repair-key and have to later 
test the existential quantifier $\exists R$ using the possible operation grouping the possible worlds
that agree on R. For that we have to keep R around during the evaluation of the algebra 
expression. Selections also must not actually remove tuples because this would mean that 
the information about which world the tuple is missing from would be lost. For example, 
the algebra expression corresponding to a Boolean formula must not return false, but in 
some form must compute the pair $\langle R, \text{false} \rangle$.

Let $\phi$ be an SO formula with free second-order variables $R_1, \dots, R_k$ and free first-order variables $x_1, \dots, x_l$. Conceptually, our proofs will produce a WSA expression for $\phi$ that computes, in each possible world identified by choices of relations $R_1, \dots, R_k$ for the free second-order variables, the relation

$$R_1 \times \cdots \times R_k \times \Theta$$

where $\Theta$ is a representation of a mapping

$$\vec a \mapsto \text{truth value of } \phi[\vec x \text{ replaced by } \vec a].$$

Truth and falsity cannot be just represented by 1 and 0, respectively, because an existential first-order quantifier will effect a projection on $\Theta$ whose result may contain both truth values 1 and 0 for a variable assignment $\vec a$. Thus, projection may map environments for which $\phi$ is true together with environments for which $\phi$ is false. In that case we would like to remove the tuples for which the truth value encoding is 0. Unfortunately, the function $F$ that performs this removal is nonmonotonic, and by Proposition 5.4 cannot be expressed in relational algebra if the input relation is to occur in the query only once. Fortunately, we do not need such a function $F$.

Definition 6.5 A PBIT (protected bit) is either $\{\bot\}$ (denoting 0) or $\{\bot, 1\}$ (denoting 1).

Given a Boolean query $Q$ (i.e., $Q$ returns either $\{\langle\rangle\}$ or $\emptyset$),

$$\text{PBIT}(Q) := (Q \times \{1\}) \cup \{\bot\}.$$

The negation of PBIT $B$ is obtained by $\{\bot, 1\} - (B \cap \{1\})$. The set union on PBITs effects a logical OR, thus a relation $R$ whose rightmost column is a PBIT and for which $\langle \vec a, 1 \rangle \in R$ implies $\langle \vec a, \bot \rangle \in R$ guarantees that projecting away a column other than the rightmost corresponds to existential quantification.
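A PBIT and its two operations are small enough to spell out. In the following sketch the string `"bot"` stands for the symbol $\bot$, a Boolean query result is modelled as a frozenset that either contains the empty tuple or is empty, and the function names are hypothetical.

```python
BOT = "bot"                      # stands for the bottom symbol

TRUE = frozenset({()})           # Boolean query result containing the empty tuple
FALSE = frozenset()              # Boolean query result: the empty relation

def pbit(q):
    """PBIT(Q) := (Q x {1}) u {bot}."""
    return frozenset({t + (1,) for t in q} | {(BOT,)})

def pbit_not(b):
    """Negation of a PBIT: {bot, 1} - (B n {1})."""
    return frozenset({(BOT,), (1,)}) - (b & frozenset({(1,)}))

def pbit_or(a, b):
    """Logical OR of two PBITs is plain set union."""
    return a | b
```

Because the tuple with $\bot$ is never removed, a PBIT is never empty, which is what allows the surrounding relation to survive selections that evaluate to false.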

For an SO formula $\phi$ with free second-order variables $R_1, \dots, R_k$ and free first-order variables $x_1, \dots, x_l$, we will define a WSA expression that computes the relation

$$TT(\phi) := 1_{R_1} \times \cdots \times 1_{R_k} \times \Theta$$

such that $\Theta = (D^l \times \{\bot\}) \cup \{\langle \vec a, 1 \rangle \mid \phi[\vec x \text{ replaced by } \vec a] \text{ is true}\}$ and $D$ is a domain relation containing the possible values for the first-order variables. (So $\Theta$ can be thought of as a mapping $D^l \to \text{PBIT}$.) The complement of such a relation is

$$\text{compl}_{D^l,T}(\Theta) := D^l \times \{\bot, 1\} - \sigma_{T=1}(\Theta).$$

Next we obtain an auxiliary construction for complementing a relation while passing 
on the second-order relation. This will be the essential tool for alternation. 

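Lemma 6.6 Let $P = 1_{R_1} \times \cdots \times 1_{R_k} \times \Theta$ where $\Theta \subseteq D_1 \times \cdots \times D_l \times \text{PBIT}$. There is a WSA expression without definitions for

$$\text{compl}_{U_1,\dots,U_k,D^l,T}(P) := 1_{R_1} \times \cdots \times 1_{R_k} \times \text{compl}(\Theta)$$

in which $P$ only occurs once.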

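Proof. Let $sch(U_i) = A_i$ and $sch(1_{R_i}) = A_i B_i$. We write $1$ for $1_{R_1} \times \cdots \times 1_{R_k}$ and $U^+$ for $U_1 \times \cdots \times U_k \times \rho_{B_1 \dots B_k}(\{0, 1\}^k)$. A definition of $\text{compl}_{U_1,\dots,U_k}(1)$ was given in Section 6.1.

$$\begin{aligned}
\text{compl}_{U_1,\dots,U_k,D^l,T}(1 \times \Theta) &= 1 \times (D^l \times \{\bot, 1\} - \sigma_{T=1}(\Theta)) \\
&= (U^+ \times D^l \times \rho_T(\{\bot, 1\})) \\
&\quad - \text{compl}_{U_1,\dots,U_k}(1) \times D^l \times \rho_T(\{\bot, 1\}) \\
&\quad - U^+ \times \sigma_{T=1}(\Theta) \\
&= (U^+ \times D^l \times \rho_T(\{\bot, 1\})) \\
&\quad - \pi_{A_1,B_1,\dots,A_k,B_k,T}\big(\sigma_{\bigvee_i (A_i = A'_i \wedge B_i \neq B'_i) \vee T' = T = 1}( \\
&\qquad U^+ \times \rho_{A'_1 B'_1 \dots A'_k B'_k T'}(\underbrace{1 \times \Theta}_{P}) \times \rho_T(\{\bot, 1\}))\big).
\end{aligned}$$

The final WSA expression is in the desired form. □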

Now we are ready to prove the main result of this section. 

Theorem 6.7 Given a formula in second-order logic, an equivalent WSA expression with- 
out definitions can be computed in linear time in the size of the formula. 

Proof Sketch. The proof is by induction. Let $\phi$ be a second-order formula with free first-order variables $\vec x$ and zero or more free second-order variables.

Induction start: Assume that $\phi$ is quantifier-free. Consider the quantifier-free formula

$$\psi(\vec x, \vec y, t) := \Big( \bigwedge_{j:\ R_j \text{ is an SO var.}} R_j(\vec y_j) \Big) \wedge (\phi \vee t = \bot),$$

where the variables $\vec y$ and $t$ are new and do not occur in $\phi$. It is easy to verify that $\psi$ defines the relation $TT(\phi)$. Specifically, the projection down to columns $\vec y_j$ represents the free second-order variable $R_j$, the projection down to columns $\vec x$ specifies all the possible assignments to the first-order variables, and $t$ is a PBIT for the truth value of $\phi$ for a given assignment to the first- and second-order variables. The corresponding WSA expression without definitions is obtained using Theorem 6.3.

Induction step ($\phi$ has quantifiers): We assume that universal quantifiers $\forall \cdot$ have been replaced by $\neg\exists \cdot \neg$. Let $P$ be the WSA expression for $\psi$ claimed by the theorem.

• First-order existential quantification: If $\phi = \exists x_i\ \psi$, the corresponding WSA expression is $\pi_{sch(P) - x_i}(P)$. It is easy to verify that the projection has exactly the effect of existential first-order quantification, $TT(\exists x_i\ \psi) = \pi_{sch(P) - x_i}(TT(\psi))$.

• Second-order existential quantification: Let $R_1, \dots, R_s$ be the free second-order variables in $\psi$. We may assume w.l.o.g. that these have disjoint schemas. If $\phi = \exists R_s\ \psi$, the corresponding WSA expression is

$$\pi_{sch(P) - sch(R_s)}(\text{possible}_{sch(R_1),\dots,sch(R_{s-1})}(P)).$$

Again, the correctness is straightforward, $TT(\exists R_s\ \psi) = \pi_{sch(P) - sch(R_s)}(TT(\psi))$.

• Negation: By Lemma 6.6, the WSA expression $\text{compl}_{U_1,\dots,U_k,D^l,T}(P)$ is equivalent to $\phi = \neg\psi$.
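The first-order part of the induction step can be traced on a toy domain. The sketch below covers only the relation $\Theta$ (no second-order variables), models $\bot$ by the string `"bot"`, and uses hypothetical function names; universal quantification is obtained as negation, projection, negation, as assumed in the proof.

```python
from itertools import product

BOT = "bot"   # stands for the bottom symbol

def theta(domain, arity, phi):
    """(D^l x {bot}) u {(a, 1) | phi holds for a}."""
    rows = {a + (BOT,) for a in product(domain, repeat=arity)}
    rows |= {a + (1,) for a in product(domain, repeat=arity) if phi(*a)}
    return frozenset(rows)

def exists(th, i):
    """Existential quantification over x_i: project away column i."""
    return frozenset(row[:i] + row[i + 1:] for row in th)

def neg(domain, th):
    """compl(Theta) := D^l x {bot, 1} - sigma_{T=1}(Theta)."""
    arity = len(next(iter(th))) - 1
    full = {a + (t,) for a in product(domain, repeat=arity) for t in (BOT, 1)}
    return frozenset(full - {row for row in th if row[-1] == 1})

def forall(domain, th, i):
    """Universal quantification expressed as not-exists-not."""
    return neg(domain, exists(neg(domain, th), i))

D = (0, 1, 2)
th = theta(D, 2, lambda x, y: y >= x)
```

Projecting away the second column of `th` gives the PBIT table of "there is a y with y >= x", and `forall` gives the table of "for all y, y >= x", which is true only for x = 0.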

All that is left to be done is to provide WSA expressions for the indicator relations $1_{R_j}$. For database relations $R_j$, the algebra expression is $\text{ind}(R_j, U_j)$. For second-order variables $R_j$, it is repair-key$_{sch(U_j)}(U_j \times \{0, 1\})$.

For an SO sentence $\phi$ (i.e., without free variables), the algebra expression computes a PBIT $TT(\phi)$ and its truth value is obtained as $\pi_\emptyset(\sigma_{T=1}(\cdot))$. □

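Example 6.8 We continue Example 4.2. Let

$$\phi = (L(c, p, 0) \wedge (P_1(p) \vee P_2(p))) \vee (L(c, p, 1) \wedge \neg(P_1(p) \vee P_2(p))).$$

Then $\Sigma_2$-QBF can be expressed by the SO sentence

$$\exists P_1 (\subseteq V_1)\ \neg\exists P_2 (\subseteq V_2)\ \neg\exists c (\in C)\ \neg\exists p\ \phi.$$

We can turn

$$(\phi \vee t = \bot) \wedge P_1(p_{12}) \wedge P_2(p_{22})$$

into WSA over indicator relations as

$$Q = \sigma_\psi\big(\rho_{c p s t_L}(1_L) \times \rho_{p_{11} t_{11} p_{12} t_{12}}((1_{P_1})^2_{V_1}) \times \rho_{p_{21} t_{21} p_{22} t_{22}}((1_{P_2})^2_{V_2}) \times \rho_t(\{\bot, 1\})\big)$$

where $\psi = (t = \bot \vee (t_L = 1 \wedge p = p_{11} = p_{21} \wedge ((s = 0 \wedge (t_{11} = 1 \vee t_{21} = 1)) \vee (s = 1 \wedge t_{11} \neq 1 \wedge t_{21} \neq 1))))$. Note that we have simplified the expression of the proof somewhat by inlining the auxiliary variables $v$ and $w$.

The complete WSA expression for the SO sentence is

$$\underbrace{\pi_\emptyset \circ \sigma_{t=1}}_{\text{PBIT to bool}} \circ \underbrace{\pi_t \circ \text{possible}}_{\exists P_1} \circ \underbrace{\text{compl}_{V_1,t}}_{\neg} \circ \underbrace{\pi_{p_{12} t_{12} t} \circ \text{possible}_{p_{12} t_{12}}}_{\exists P_2} \circ \underbrace{\text{compl}_{V_1,V_2,t}}_{\neg} \circ \underbrace{\pi_{p_{12} t_{12} p_{22} t_{22} t}}_{\exists c} \circ \underbrace{\text{compl}_{V_1,V_2,C,t}}_{\neg} \circ \underbrace{\pi_{p_{12} t_{12} p_{22} t_{22} c t}}_{\exists p}(\underbrace{Q}_{\phi}).$$

We replace $1_L$ by $\text{ind}(L, \cdot)$ and $1_{P_i}$ by repair-key$_p(\rho_{pt}(V_i \times \{0, 1\}))$. □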

Thus, definitions add no power to WSA.

Corollary 6.9 WSA without definitions captures WSA.

The data complexity of a query language refers to the problem of evaluating queries on databases assuming that the queries are fixed and only the database is part of the input, while combined complexity assumes that both the query and the database are part of the input [15]. SO logic is complete for the polynomial hierarchy (PHIER) with respect to data complexity and PSPACE-complete with respect to combined complexity [14], a generalization of Fagin's Theorem [6] (see also [11]). This yields the following.

Corollary 6.10

1. WSA with or without definitions is PHIER-complete with respect to data complexity,

2. WSA with definitions is PSPACE-hard with respect to combined complexity, and

3. WSA without definitions is PSPACE-complete with respect to combined complexity.

We cannot directly conclude an upper bound on the combined complexity of WSA with definitions from the reduction of Theorem 4.3 because it was exponential-time: if WSA definitions are used, several copies of formulae $\psi_V$ may be used in the SO formula constructed in the proof, and this may happen recursively. However, we can think of the proof construction as a linear-time mapping from WSA with definitions to second-order logic with definitions. The standard PSPACE algorithm for second-order logic extends directly to second-order logic with definitions: of the formula, we only have to maintain a current path in its parse tree, which is clearly of polynomial size. It follows that

Proposition 6.11 WSA with definitions is PSPACE-complete with respect to combined 
complexity. 

7 Related Work 

In an early piece of related work, Libkin and Wong [12] define a query algebra for handling
both nested data types and uncertainty. Their notion of uncertainty called or-sets (as a 
generalization of the or-sets of [8]) is treated as a special collection type that can syntacti- 
cally be thought of as a set of data and is only interpreted as uncertainty on an additional 
"conceptual level". The result is a very elegant and clean algebra that nicely combines 
complex objects with uncertainty. While their language is stronger and can manage nested 
data, there is nevertheless a close connection to WSA, which can be thought of as a flat 
relational version of their language. Indeed, the or-set language contains an operator $\alpha$
that is essentially equivalent to the repair-key operator of WSA. 

TriQL, the query language of the Trio project [E], subsumes the power of relational 
algebra and supports an operation "groupalts" which expresses the repair-key operation of 
WSA applied to a certain relation. There are many more operations in TriQL, but it is hard 
to tell whether possible$_{\vec A}$ is expressible in TriQL since no formal semantics of the language
is available. Moreover, TriQL contains a number of representation-dependent (non-generic 
PQ) operations which may return semantically different results for different semantically 
equivalent representations of a probabilistic database. This makes TriQL hard to study 
and compare with WSA. However, it seems that WSA is a good candidate for a clean core 
to TriQL, and the results of the present paper provide additional evidence that it is highly 
expressive. 

The probabilistic databases definable using repair-key from certain relations are also exactly the block independent-disjoint (BID) tables of Ré and Suciu. In their paper [13], they study the related representability problems for BID tables. Their results suggest that BID tables are more powerful than tuple-independent tables, which correspond to uncertain tables definable using the subset operation. This is in line with observations made in Section 5 of the present paper.

The algebra defined in our own earlier work [3] is exactly the one described in the present 
paper, modulo the following details. Most importantly, while repair-key is introduced there 
as part of the algebra, most of the paper focuses on the fragment that is obtained by 
replacing repair-key by choice-of. Moreover, the syntax of possible$_Q$ allows for the grouping
of worlds by a query $Q$ that can be given as a parameter; the syntax is possible$_Q(Q')$.
An operation possible$_{\vec A}$ in the syntax of the present paper corresponds to an operation
possible$_{\pi_{\vec A}(\cdot)}$ in the syntax of [3]. The results of this paper imply that allowing general
queries Q for grouping adds no power, so we are indeed studying the same language. The 
paper [3] also gives an SQL-like syntax for WSA, in which the intuition of possible$_{\vec A}$ is made
explicit by the syntax "select possible . . . group worlds by . . . " . 

In recent work [2, 10, 9], we have developed efficient techniques for processing a large part of WSA. The only operations that currently defy good solutions are possible$_{\vec A}$ (i.e., with grouping, not possible$_\emptyset$) and, to a lesser extent, relational difference. Indeed, the repair-key operator on the standard representations described in Example 2.1 can be implemented efficiently, even though semantically it generally causes an exponential blowup in the size of the set of possible worlds. Thus, it is natural to ask for the expressive power of WSA with possible$_{\vec A}$ replaced by possible. The construction of the proof of Theorem 4.1 can map any SO formula of the form $\exists R\ \phi$ or $\forall R\ \phi$ where $\phi$ is FO to WSA. It is not hard to see that despite the restriction to a single second-order quantifier, this fragment of WSA (with definitions) can express all of NP $\cup$ co-NP. For an upper bound, it seems that all such restricted WSA queries have data complexity in $\Delta^p_2$ (i.e., $P^{NP}$).

8 Conclusions 

The main contribution of this paper is to give what is apparently the first compositional algebra
that exactly captures second-order logic over finite structures, a logic of wide interest.

Second-order logic is a natural yardstick for the expressiveness of query languages for 
uncertain databases. It is an elegant and well-studied formalism that naturally captures 
what-if queries. It can be argued that second-order logic takes the same role in uncertain 
databases that first-order logic and relational algebra take in classical relational databases. 
In that sense, the expressiveness result of this paper, WSA = SO, is an uncertain databases 
analog of Codd's Theorem. 

Finding the right query algebra for uncertain databases is important because efficient 
query processing techniques are easier to obtain for algebraic languages without variables 
or quantifiers, and algebraic operators are natural building blocks for database query plans. 
Of course, the expressiveness result of this paper also implies that WSA has high complexity 
and thus this paper can only be an initial call for the search for more efficiently processible 
fragments of WSA that retain some of its flavor of simplicity and cleanliness. 